I/O Overhead and Parallel VLSI Architectures for Lattice Computations

Authors

  • Mark H. Nodine
  • Daniel P. Lopresti
  • Jeffrey Scott Vitter
Abstract

In this paper we introduce input/output (I/O) overhead $\psi$ as a complexity measure for VLSI implementations of two-dimensional lattice computations of the type arising in the simulation of physical systems. We show by pebbling arguments that $\psi = \Omega(n^{-1})$ when there are $n^2$ processing elements available. If the results are required to be observed at every generation, and no on-chip storage is allowed, we show the lower bound is the constant 2. We then examine four VLSI architectures and show that one of them, the multigeneration sweep architecture, also has I/O overhead proportional to $n^{-1}$. We compare the constants of proportionality between the lower bound and the architecture. Finally, we prove a closed-form solution for the discrete minimization equation giving the optimal number of generations to compute for the multigeneration sweep architecture.

Similar resources

A Sliding Memory Plane Array Processor

This paper describes a new mesh-connected SIMD architecture, called a Sliding Memory Plane (SliM) Array Processor. On SliM, the inter-processing element (inter-PE) communication, using the sliding memory plane, and the data input/output (I/O), using two I/O planes, can occur without interrupting the PEs, which greatly diminishes the communication and I/O overhead. SliM is unique in its ability...

Performance of VLSI Engines for Lattice Computations

We address the problem of designing and building efficient custom VLSI-based processors to do computations on large multi-dimensional lattices. The design tradeoffs for two architectures which provide practical engines for lattice updates are derived and analyzed. We find that I/O constitutes the principal bottleneck of processors designed for lattice computations, and we de...

Parallel Compensation of Scale Factor for the CORDIC Algorithm

The compensation of scale factor imposes significant computation overhead on the CORDIC algorithm. In this paper we present two algorithms and the corresponding architectures (one for both rotation and vectoring modes and the other only for rotation mode) to perform the scaling factor compensation in parallel with the classical CORDIC iterations. With these methods, the scale factor compensatio...

Scientific Computing on Bulk Synchronous Parallel Architectures

Bulk synchronous parallel (BSP) architectures offer the prospect of achieving both scalable parallel performance and architecture-independent parallel software. They provide a robust model on which to base the future development of general-purpose parallel computing systems. In this paper we theoretically and experimentally analyse the efficiency with which a wide range of important scientific computa...

Design and Implementation of a High Speed Systolic Serial Multiplier and Squarer for Long Unsigned Integer Using VHDL

A systolic serial multiplier for unsigned numbers is presented which operates without zero words inserted between successive data words, outputs the full product and has only one clock cycle latency. The multiplier is based on a modified serial/parallel scheme with two adjacent multiplier cells. Systolic concept is a well-known means of intensive computational task through replication of fu...

Journal title:
  • IEEE Trans. Computers

Volume 40, Issue

Pages -

Publication date 1990